Do not merge?: Subclass Sync - Direct in source, indirect in Mondo - Mini build #717
base: subclass-sync-direct-source-indirect-mondo
- Ran and updated outputs (65f2659 to a72151f)
One bug (probably synchronisation); otherwise just some interesting things that IMO should be manually spot-checked in the ontology by @twhetzel.
src/ontology/reports/ncit.subclass.confirmed-direct-source-indirect-mondo.robot.tsv
This is an interesting fact to note all by itself. Since the only source of subclass axioms in OMIM is OMIMPS, I would have expected a much shorter list. What does this mean?
There are 830 cases where Mondo injects a class between an OMIMPS class and an OMIM class.
I would suggest @twhetzel verifies 2 or 3 of these because that sounds a bit odd to my ears. Maybe make an issue and have Chris or Sabrina sign off on these 2 or 3 cases.
I'll leave this to the experts, but just wanted to take a cursory glance. Here are some cataract terms.
I'm not sure what @matentzn means by injecting a class. At least in the one case I checked, MONDO:0007286, cataract 30, corresponds to https://omim.org/entry/116300, CATARACT 30, MULTIPLE TYPES. And MONDO:0005129, cataract, corresponds to https://omim.org/entry/116200, CATARACT 1, MULTIPLE TYPES. Seems to me like the subjects/objects for Mondo and OMIM are matched well here, and I don't see an (injected) intermediary class?
subject_mondo_id | subject_mondo_label | object_mondo_id | subject_source_id | object_source_id | object_mondo_label |
---|---|---|---|---|---|
MONDO:0007286 | cataract 30 | MONDO:0005129 | OMIM:116300 | OMIMPS:116200 | cataract |
More cataract cases
subject_mondo_id | subject_mondo_label | object_mondo_id | subject_source_id | object_source_id | object_mondo_label |
---|---|---|---|---|---|
MONDO:0007278 | cataract 32 multiple types | MONDO:0005129 | OMIM:115650 | OMIMPS:116200 | cataract |
MONDO:0007279 | cataract 7 | MONDO:0005129 | OMIM:115660 | OMIMPS:116200 | cataract |
MONDO:0007280 | cataract 8 multiple types | MONDO:0005129 | OMIM:115665 | OMIMPS:116200 | cataract |
MONDO:0007283 | cataract 42 | MONDO:0005129 | OMIM:115900 | OMIMPS:116200 | cataract |
MONDO:0007284 | cataract 20 multiple types | MONDO:0005129 | OMIM:116100 | OMIMPS:116200 | cataract |
MONDO:0007286 | cataract 30 | MONDO:0005129 | OMIM:116300 | OMIMPS:116200 | cataract |
MONDO:0007287 | cataract 41 | MONDO:0005129 | OMIM:116400 | OMIMPS:116200 | cataract |
MONDO:0007288 | cataract 6 multiple types | MONDO:0005129 | OMIM:116600 | OMIMPS:116200 | cataract |
MONDO:0007289 | cataract 13 with adult I phenotype | MONDO:0005129 | OMIM:116700 | OMIMPS:116200 | cataract |
MONDO:0007290 | cataract 5 multiple types | MONDO:0005129 | OMIM:116800 | OMIMPS:116200 | cataract |
What I mean is this. Consider this normal case:
Mondo:1 -> OMIMPS:1
Mondo:2 -> OMIM:2
Mondo:2 subclass Mondo:1
What this table is saying is that there must be some intermediate Mondo class Mondo:X like this:
Mondo:1 -> OMIMPS:1
Mondo:2 -> OMIM:2
Mondo:2 subclass Mondo:X
Mondo:X subclass Mondo:1
That's the definition of indirect. Else why would it appear in this table?
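The direct/indirect distinction above can be sketched in a few lines of Python. This is only an illustration, not the code the sync pipeline uses: the function name `classify_edge` and the edge-list input are assumptions, and the IDs are the placeholder ones from the example.

```python
from collections import defaultdict


def classify_edge(child, parent, direct_edges):
    """Classify child -> parent as 'direct', 'indirect', or 'absent'.

    direct_edges is an iterable of (subclass, superclass) pairs, i.e. the
    asserted subclass axioms. Hypothetical helper, for illustration only.
    """
    parents_of = defaultdict(set)
    for sub, sup in direct_edges:
        parents_of[sub].add(sup)

    # An asserted edge wins over any longer path in this sketch.
    if parent in parents_of[child]:
        return "direct"

    # Otherwise walk up the ancestors looking for the parent.
    seen = set()
    stack = list(parents_of[child])
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node == parent:
            return "indirect"
        stack.extend(parents_of[node])
    return "absent"


# The second scenario above: Mondo:X sits between Mondo:2 and Mondo:1.
mondo_edges = [("Mondo:2", "Mondo:X"), ("Mondo:X", "Mondo:1")]
print(classify_edge("Mondo:2", "Mondo:1", mondo_edges))
```

Note the design choice: when a pair has both an asserted edge and a longer path, this sketch reports it as direct.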
Oh of course! Derp, I got it.
Well, just to continue my above example, here's further detail with the intermediate class:
id: MONDO:0007289
name: cataract 13 with adult I phenotype
is_a: MONDO:0011060 {source="Orphanet:91492/btnt"} ! early-onset non-syndromic cataract
id: MONDO:0011060
name: early-onset non-syndromic cataract
is_a: MONDO:0005129 {source="Orphanet:91492", source="Orphanet:91492/inferred"} ! cataract
Needs curation eyes, sorry!
For the cataract example that Joe posted, there is one direct path in Mondo, i.e. MONDO:0007289 'cataract 13 with adult I phenotype' subClassOf MONDO:0005129 'cataract', and one indirect path, i.e. MONDO:0007289 'cataract 13 with adult I phenotype' subClassOf MONDO:0011060 'early-onset non-syndromic cataract' subClassOf MONDO:0005129 'cataract'.
Since the direct path already has the subClassOf provenance of OMIM:116700, is the general expectation, when a direct path and an indirect path both occur, that the provenance stated in the "confirmed-direct-source-indirect-mondo.robot" file is the same as the provenance that already exists because of the direct path? How are these updates applied to Mondo, e.g. could information added from "confirmed" be lost by then adding "confirmed-direct-source-indirect-mondo.robot"? Should there be a QC check so that a subClassOf source annotation does not exist for more than 1 external source?
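If such a QC check were wanted, a minimal sketch could look like the following. It assumes the OBO `is_a:` lines carry `source="..."` qualifiers as in the stanzas quoted earlier, and that the CURIE prefix before the first colon identifies the external source. The helper names are hypothetical, and whether this check is desirable at all is still the open question.

```python
import re

# Matches each source="..." qualifier on an OBO is_a line.
SOURCE_RE = re.compile(r'source="([^"]+)"')


def source_prefixes(is_a_line):
    """Return the set of CURIE prefixes found in source qualifiers."""
    return {value.split(":", 1)[0] for value in SOURCE_RE.findall(is_a_line)}


def has_multiple_sources(is_a_line):
    """True when the axiom is annotated with more than one external source."""
    return len(source_prefixes(is_a_line)) > 1


# Line taken from the stanza quoted in this thread: one source, Orphanet.
quoted = ('is_a: MONDO:0005129 {source="Orphanet:91492", '
          'source="Orphanet:91492/inferred"} ! cataract')

# Hypothetical line, constructed only to show a flagged case.
made_up = ('is_a: MONDO:0005129 {source="OMIM:116700", '
           'source="Orphanet:91492"} ! cataract')

print(has_multiple_sources(quoted), has_multiple_sources(made_up))
```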
Super interesting!! 4500 CASES? WOW! If this is correct this is a huge story for the Rare Disease paper! Maybe verify 2 or 3 random ones!
What is the achievement here? Is it just that Mondo has grown to provide super granular rare disease ontologization that a/the leading rare disease ontology, Orphanet, does not?
The story is that the hierarchy already available through ORDO was enriched with more fine-grained groupings that are not in ORDO!
There are 4,004 cases where the "indirect" path in Mondo is from the Mondo term up to MONDO:0000001 'disease'. A "direct path" in the source does not rule out indirect paths in the source that provide similar information, e.g. MONDO:0000023 'infantile liver failure'.
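A count like this could be reproduced with a short script along these lines. This is a sketch under assumptions: it takes the column names from the tables earlier in the thread, reads tab-separated text, and uses an inline sample in which the second data row carries placeholder source IDs (`EXAMPLE:1`, `EXAMPLE:0`) that are not real data. If the real report is a ROBOT template, it may carry an extra row of template strings under the header; that row would simply not match and not be counted.

```python
import csv
import io

ROOT = "MONDO:0000001"  # 'disease', the root term named in the comment above


def count_root_objects(tsv_text):
    """Count rows whose object_mondo_id is the Mondo root term."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return sum(1 for row in reader if row["object_mondo_id"] == ROOT)


# Inline sample. Row 1 is from this thread; row 2 uses placeholder source IDs.
SAMPLE_ROWS = [
    ["subject_mondo_id", "subject_mondo_label", "object_mondo_id",
     "subject_source_id", "object_source_id", "object_mondo_label"],
    ["MONDO:0007286", "cataract 30", "MONDO:0005129",
     "OMIM:116300", "OMIMPS:116200", "cataract"],
    ["MONDO:0000023", "infantile liver failure", "MONDO:0000001",
     "EXAMPLE:1", "EXAMPLE:0", "disease"],
]
SAMPLE = "\n".join("\t".join(row) for row in SAMPLE_ROWS)

print(count_root_objects(SAMPLE))
```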
…ss-sync-direct-source-indirect-mondo-mini-build
- Update: Re-ran again, this time using the most up-to-date inputs.
Mini build for:
Google Sheet:
sync-subClassOf.confirmed-direct-source-indirect-mondo.tsv